Memory-based active learning for French broadcast news

Authors

  • Frédéric Tantini
  • Christophe Cerisara
  • Claire Gardent

Abstract

Stochastic dependency parsers can achieve very good results when they are trained on large, manually annotated corpora. Active learning aims to reduce this annotation cost by selecting as few sentences as possible while still producing the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. The reduced dependency between the parsing and selection models opens interesting perspectives for combining models in future work. The approach is validated on the task of building a French broadcast news corpus for dependency parsing, where it outperforms the baseline entropy-based uncertainty sampling. We plan to extend this work with self- and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
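
The abstract contrasts an entropy-based uncertainty baseline with a score that balances uncertainty against representativeness. The sketch below illustrates that trade-off under stated assumptions; it is not the authors' method. It assumes a parser that outputs, for each word, a probability distribution over candidate heads, and it swaps the paper's two memory-based distances for a generic information-density weighting (mean cosine similarity to the rest of the unlabeled pool). The functions `sentence_uncertainty`, `representativeness`, `select`, the exponent `beta` and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Dist = Sequence[float]  # assumed: probability distribution over candidate heads for one word


def entropy(probs: Dist) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sentence_uncertainty(head_dists: Sequence[Dist]) -> float:
    """Hypothetical sentence score: mean per-word entropy of the parser's head distributions."""
    if not head_dists:
        return 0.0
    return sum(entropy(d) for d in head_dists) / len(head_dists)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def representativeness(sid: str, feats: Dict[str, List[float]]) -> float:
    """Hypothetical density term: mean cosine similarity to the other pool sentences, clamped at 0."""
    others = [vec for other, vec in feats.items() if other != sid]
    if not others:
        return 0.0
    mean_sim = sum(cosine(feats[sid], vec) for vec in others) / len(others)
    return max(0.0, mean_sim)


def select(pool: Dict[str, List[List[float]]], feats: Dict[str, List[float]],
           k: int, beta: float = 0.0) -> List[str]:
    """Pick k sentence ids by uncertainty * representativeness**beta (beta=0: pure entropy sampling)."""
    def score(sid: str) -> float:
        return sentence_uncertainty(pool[sid]) * representativeness(sid, feats) ** beta
    return sorted(pool, key=score, reverse=True)[:k]


# Toy unlabeled pool (hypothetical): per-word head distributions and one feature vector per sentence.
pool = {
    "s1": [[0.5, 0.5], [0.9, 0.1]],  # moderately uncertain
    "s2": [[1.0, 0.0], [1.0, 0.0]],  # parser is certain
    "s3": [[1 / 3, 1 / 3, 1 / 3]],   # most uncertain
}
feats = {"s1": [1.0, 0.0], "s2": [1.0, 0.1], "s3": [0.0, 1.0]}  # s3 is an outlier

print(select(pool, feats, 2))            # → ['s3', 's1']
print(select(pool, feats, 1, beta=1.0))  # → ['s1']
```

With `beta=0` the ranking reduces to plain entropy sampling and the outlier `s3` is picked first; with `beta=1` the density term demotes it in favour of `s1`, which is uncertain but resembles the rest of the pool. This mirrors the uncertainty versus representativeness compromise the abstract describes, using a different (generic) representativeness measure.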

Related articles

The need to create a media block for the convergence of overseas news networks

As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...

Accounting for Prosodic Information to Improve ASR-Based Topic Tracking for TV Broadcast News

The increasing quantity of video material available on line requires improved methods to help users navigate such data, among which are topic tracking techniques. The goal of this paper is to show that prosodic information can improve an ASR-based topic tracking system for French TV Broadcast News. To this end, two kinds of prosodic information — extracted with and without a learning phase — are...

French Broadcast News Transcription

We describe a French broadcast news transcription system created in the scope of the CIMWOS project [1]. We collected a corpus based on two French and one Belgian TV stations. This corpus forms the base of various system components, such as ASR and Speaker ID. We discuss a few problems posed to speech recognition by characteristics of the French language and approaches to solve them. Finally we...

Study of Numerical Processing Speed, Implicit and Explicit Memory, Active and Passive Memory, Conservation Abilities, and Visual-Spatial Skills of Students with Dyscalculia

Background and Purpose: Learning disorder is one of the common disorders in students, which can lead to the occurrence of educational problems and secondary disorders in them. Based on psychopathological criteria, dyscalculia is one of the subcategories of learning disorder. Children with this disorder have problems in perception of spatial relations and in different cognitive abilities. Theref...

VOXALEAD: A Scalable Video Search Engine Based On Content

Most news organizations provide immediate access to topical news broadcasts through RSS streams or podcasts. Until recently, applications have not permitted a user to perform content based search within a longer spoken broadcast to find the segment that might interest them. Recent progress in both automatic speech recognition (ASR) and natural language processing (NLP) has produced robust tools...

Publication date: 2010